Abstract

Although the exchange of genetic information by recombination plays an important role in the evolution of
viruses, it is not clear how it generates diversity. Understanding recombination events helps in studying
the evolution of new virus strains and new viruses. Geminiviruses are plant viruses with ambisense
single-stranded circular DNA genomes and are among the most economically important plant viruses in
agricultural production. Small circular single-stranded DNA satellites, termed DNA-β, have recently been found
to be associated with some geminivirus infections. In this paper we analyze several DNA-β sequences of
geminiviruses for recombination events using phylogenetic and statistical analysis, and we find that one strain
from ToLCMaB has a recombination pattern and is a recombinant molecule between two strains from two
species, PaLCuB-[IN:Chi:05] (major parent) and ToLCB-[IN:CP:04] (minor parent). We propose that this
recombination event contributed to the evolution of the strain of ToLCMaB in South India. The hidden Markov
model (HMM) method developed by Webb et al., which estimates the phylogenetic tree along the whole alignment,
provides a recombination history of these DNA-β strains. This is the first time that this statistical method has
been used to study DNA-β recombination, and it gives a clear recombination history.

1 Introduction

Geminiviruses are emerging as one of the most economically important groups of plant viruses in agricultural
production [?][?][36]. Begomovirus is the largest genus of the family Geminiviridae and is phylogenetically and
geographically divided into two groups: the Old World viruses and the New World viruses. New World
begomoviruses have two genomic components, DNA-A and DNA-B, while most Old World begomoviruses
have a single component, DNA-A. About a decade ago, a satellite molecule called DNA-β was found to be
associated with some Old World geminiviruses [6][28].

DNA-β has a genome approximately 1.3-1.5 kb long and depends on the helper virus DNA-A for its
replication, movement and transmission [6][?][28]. It is classified as a sub-viral agent by the International
Committee on Taxonomy of Viruses (ICTV). The most typical plant symptoms caused by geminiviruses are due to an
association of DNA-β with DNA-A, whereas DNA-A alone does not lead to severe damage to crops [?][?]. The C1 gene
encoded by DNA-β was found to suppress host defense systems [8] and modulate host development [35], and
is believed to be one of the determining factors for geminivirus-induced disease symptom development [5].

DNA-β has not been found in the New World (the North and South American continents) and is
believed to have become associated with Old World begomoviruses after the geographical divergence of the "Old"
and "New" continents [?]. Although DNA-β can be supported by a relatively wide range of species of the helper
virus DNA-A [11], it is proposed to co-evolve with the DNA-A component [5].

Recombination plays an important role in geminivirus [15] and DNA-β [16] evolution. A fragment of
the DNA-β genome infecting tomato was reported to have migrated to cotton via recombination with other adaptive
DNA-β molecules [3], indicating the role of recombination events in the evolution of DNA-β molecules.
Because of this important role of recombination in DNA-β evolution, the analysis of recombination events in
DNA-β is especially important for understanding viral evolution and disease epidemics, as well as for the
development of potential control strategies.

In this paper, we apply a statistical phylogenetic analysis using the Bayesian stochastic method developed in [33],
which infers changes in phylogeny along a multiple sequence alignment while accounting for rate heterogeneity, to
estimate potential recombination breakpoints in DNA-β. This is the first time that this statistical method has been
used to study DNA-β recombination, and it gives a clear recombination history. In order
to confirm our results, we also apply the statistical phylogenetic methods implemented in [21] to the same data sets.
We find that the results obtained with the method in [33] and with the methods in [21] are very similar to each other.
One strain of Tomato leaf curl Maharastra betasatellite (ToLCMaB) has a recombination pattern and is possibly
a recombinant molecule between two strains from two distinct species, Papaya leaf curl betasatellite (PaLCuB)
and Tomato leaf curl betasatellite (ToLCB): PaLCuB-[IN:Chi:05] (major parent) and ToLCB-[IN:CP:04] (minor
parent). This recombination event may have contributed to the evolution of Tomato leaf curl Maharastra betasatellite.

2 Data set 

A proposed taxonomy of DNA-β that uses 78% nucleotide sequence identity as the demarcation threshold has been
accepted and is widely used for distinguishing species from strains of DNA-β [?]. This resulted in about 51 distinct
species of DNA-β associated with begomoviruses.
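
As a rough illustration of how such an identity threshold can be applied, the following sketch computes the
percent nucleotide identity of two aligned sequences and compares it with the 78% cut-off. The function names,
the choice to skip gapped columns, and the use of ">=" at the threshold are assumptions made for this example;
they are not taken from the taxonomy proposal itself.

```python
SPECIES_THRESHOLD = 78.0  # percent identity used as demarcation threshold in the text


def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns in which neither sequence has a gap.

    Skipping gapped columns is an assumption of this sketch; other conventions exist.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def same_species(seq_a, seq_b, threshold=SPECIES_THRESHOLD):
    """True if the pair would be treated as strains of one species under the threshold."""
    return percent_identity(seq_a, seq_b) >= threshold


# Toy aligned sequences (hypothetical, far shorter than a real 1.3-1.5 kb betasatellite).
identity = percent_identity("ACGTACGTAC", "ACGTACGTTT")  # 8 of 10 columns match
```

In practice the identity would be computed on full-length pairwise alignments; the toy strings here only show
the arithmetic.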

Tomato leaf curl disease (ToLCD) is caused by begomoviruses associated with betasatellites. A recent
report showed that the different species of DNA-β associated with ToLCD in India are geographically isolated
in their distribution [30]. The DNA-β molecules in southern and central India are more closely related to each
other than to those in northern India.

To observe potential recombination events among these geographically related DNA-β species, we chose
four strains from four distinct species of DNA-β associated with ToLCD in India. Among the four strains,
ToLCBDB-[IN;Luk;05] (taxon-0) and ToLCB-[PK;RYK;97] (taxon-1) are from northern India, while
PaLCuB-[IN;Chi;05] (taxon-2) and ToLCMaB-[IN;Pun;04] (taxon-3) are from southern India. In the same report, as
well as in another report [22], the species ToLCBDB and ToLCB are closely related in the phylogenetic tree, while
PaLCuB and ToLCMaB are sisters (neighbors).

Another ToLCD-associated DNA-β from Indonesia (taxon-4) was chosen as an outgroup. Five other species
of non-ToLCD-related DNA-β from eastern and southeastern Asia (taxa 5, 6, 7, 8, and 9) were also chosen
for the outgroup. See Table 1 for details.

3 Materials and Methods 

First, a data set of ten DNA-β genome sequences in .fasta format was aligned using the clustalw-multialign software
with the following parameters: gap opening penalty 10.0, gap extension penalty 0.2, gap separation penalty
range 8, DNA weight matrix IUB [32].

To analyze recombination in DNA-β from geminiviruses, we used the software package from [33]. This
method applies a hidden Markov model (HMM) to infer changes in phylogeny along a multiple sequence
alignment while accounting for rate heterogeneity. Under the HMM, the hidden states at each site are all possible
unrooted tree topologies on a fixed number of leaves n. The observed state space is {A, C, G, T, -}.
Under the evolutionary model, the evolution of homologous DNA/RNA sequences (or protein-coding sequences,
where the state space is of size 61) can be described by continuous-time Markov chains on a phylogenetic
tree. A continuous-time Markov chain is characterized by a substitution rate matrix, and the phylogenetic tree
summarizes the relationships between the species in terms of edge lengths (times since divergence) and common
ancestors. The DNA sequences are observed only at the leaves; the phylogenetic tree, the
substitution events (time and type) and the edge lengths are unobserved. The transition matrix P(t) for a
continuous-time Markov process can be written as exp(Qt), where Q is a parametrized substitution rate matrix that
determines the Markov process. In this method the evolutionary model is set to the Hasegawa-Kishino-Yano (HKY)
model [?].

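The rate matrix Q under the HKY model is defined as follows. Let $\Sigma = \{A, C, G, T\}$ and let
$\pi = (\pi_a : a \in \Sigma)$, with $\sum_{a \in \Sigma} \pi_a = 1$, denote the stationary distribution of the
Markov chain. This distribution can be estimated from the nucleotide frequencies in a single sequence. With rows
and columns ordered A, C, G, T, the HKY model has substitution rate matrix

\[
Q = \begin{pmatrix}
\cdot & \beta\pi_C & \alpha\pi_G & \beta\pi_T \\
\beta\pi_A & \cdot & \beta\pi_G & \alpha\pi_T \\
\alpha\pi_A & \beta\pi_C & \cdot & \beta\pi_T \\
\beta\pi_A & \alpha\pi_C & \beta\pi_G & \cdot
\end{pmatrix},
\]

where the diagonal elements are such that each row sums to zero and the two unknown parameters are $\alpha$ and
$\beta$ (in the usual parametrization, $\alpha$ scales transitions and $\beta$ scales transversions).
The software from [33] estimates the posterior distribution using a Markov chain Monte Carlo (MCMC) method
under the HMM and then outputs each tree topology with its posterior probability at each site (see [33] for
details).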
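
To make the relation P(t) = exp(Qt) concrete, here is a minimal sketch that builds an HKY rate matrix and
approximates the transition matrix numerically. It assumes the common convention that α applies to transitions
(A<->G, C<->T) and β to transversions; the function names, the example frequencies and the scaling-and-squaring
approximation are choices made for this illustration and are not part of the software from [33].

```python
NUCS = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky_rate_matrix(pi, alpha, beta):
    """Build the 4x4 HKY rate matrix for stationary frequencies pi (dict keyed by A, C, G, T)."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(NUCS):
        for j, b in enumerate(NUCS):
            if i != j:
                rate = alpha if (a, b) in TRANSITIONS else beta
                q[i][j] = rate * pi[b]
        q[i][i] = -sum(q[i])  # diagonal chosen so that the row sums to zero
    return q


def mat_mul(x, y):
    """Plain 4x4 matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transition_matrix(q, t, squarings=10, terms=12):
    """Approximate P(t) = exp(Q t) by a Taylor series on Q t / 2^squarings, then repeated squaring."""
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(4)] for i in range(4)]
    result = [[float(i == j) for j in range(4)] for i in range(4)]  # identity matrix
    term = [row[:] for row in result]
    for n in range(1, terms + 1):
        term = mat_mul(term, a)
        term = [[v / n for v in row] for row in term]  # term is now a^n / n!
        result = [[result[i][j] + term[i][j] for j in range(4)] for i in range(4)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


# Hypothetical stationary frequencies and rate parameters, for illustration only.
pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
q = hky_rate_matrix(pi, alpha=2.0, beta=1.0)
p = transition_matrix(q, t=0.5)  # each row of p is a probability distribution over A, C, G, T
```

For long branches the rows of P(t) approach the stationary distribution π, which gives a quick sanity check on
any implementation.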

We have used the HKY model for the phylogenetic analysis of our data sets in this paper, since the HMM software in
[33] uses the HKY model. We also used the generalized time reversible (GTR) + gamma + invariant
model, which is within the 95% confidence interval computed via Akaike's information criterion (AIC) in the
software ModelTest [12][25], to reconstruct an ML tree. The ML tree under the GTR + gamma + invariant
model has the same tree topology as the ML tree under the HKY model in Fig. 5, as well as the consensus tree under
the HKY model in Fig. 4.

The generated alignment file in phylip format was passed to the HMM software [33] using the command
"java -jar ST-HMM.jar" with the following parameters: iterations: 50000, burn-in: 25000, rates: 0.001, 0.003,
0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 100.0, lambda: 5, kappa: 2.0, tuningpar: 0.4. The command
"java -jar STHMM-Posterior.jar" was used to summarize the posterior distribution, and trees with posterior
probability above 0.05 were selected using the command "java -jar TreeSummary.jar". The region 1-1000 nucleotides
(nt) was found to have a clear pattern of recombination, while the region 1000-1505 nt seems to have a messy
pattern of tree probabilities.
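
One simple way to turn per-site topology posteriors into candidate breakpoints is to group consecutive sites by
their most probable topology. The sketch below does this for a toy input. The input structure (one dictionary of
topology label to probability per site), the labels and the 0.5 cut-off are hypothetical; this is not the output
format of the software from [33].

```python
def dominant_topology_segments(posteriors, min_prob=0.5):
    """Group consecutive sites that share the same dominant topology.

    posteriors: list with one dict per site, mapping topology label -> posterior probability.
    Returns a list of (start, end, label) with 1-based inclusive positions; label is None
    where no topology reaches min_prob.
    """
    segments = []
    for pos, site in enumerate(posteriors, start=1):
        label, prob = max(site.items(), key=lambda kv: kv[1])
        if prob < min_prob:
            label = None  # no topology clearly dominates at this site
        if segments and segments[-1][2] == label:
            segments[-1] = (segments[-1][0], pos, label)  # extend the current run
        else:
            segments.append((pos, pos, label))  # start a new run
    return segments


def candidate_breakpoints(segments):
    """Pairs of adjacent positions between which the dominant topology changes."""
    return [(seg[1], seg[1] + 1) for seg in segments[:-1]]


# Toy posterior profile: topology "pink" dominates, except for a short "blue" stretch.
toy = (
    [{"pink": 0.9, "blue": 0.1}] * 3
    + [{"pink": 0.2, "blue": 0.8}] * 2
    + [{"pink": 0.95, "blue": 0.05}] * 2
)
segments = dominant_topology_segments(toy)
breaks = candidate_breakpoints(segments)
```

On real output the runs would be hundreds of sites long, and short runs with low posterior support would need
to be filtered before being read as recombination breakpoints.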

To apply phylogenetic analysis to positions 1-1505 nt and 1000-1505 nt of the 10 viral
sequences, aligned with the clustalw-multialign software and saved in nexus format, we estimated the posterior
distribution under the generalized time reversible (GTR) + Γ model and the HKY model, and we estimated the
maximum likelihood estimators. First we applied the software MrBayes [7] to analyze the splits of the different taxa
on the consensus tree under the GTR + Γ and HKY models. 647300 generations were sampled for the 1-1505
nt alignment, while 3600000 generations were sampled for the 1000-1505 nt alignment. The first 25% of the samples
were discarded as burn-in. We ran four Markov chains for each model. We followed the recommendation of MrBayes,
which suggests running the chains until the standard deviation of the chains' split frequencies is less than 0.01.

In addition, to verify our results we applied the software RDP3 [21] to the same data sets. The sequence
alignment in phylip format was used as input for RDP3. Parameters were set to the RDP3 defaults. RDP3
implements several different methods to find recombination sites: RDP [19], GeneConv [24],
BootScan [20], MaxChi [31], Chimaera [26], SIScan [?], and 3Seq [4].

The RDP method takes three basic steps. First, non-informative sites are discarded from the input data
set and, for every triplet of taxa {A, B, C} from the data set, the sister pair A and B is chosen. Second,
a window of user-defined width is moved along the aligned sub-sequences one nucleotide at a time, and the
average percentage identity of each of the three possible sequence pairs among {A, B, C} is computed at each
position. Third, the probability that the nucleotide arrangement in the identified region, which results in A or B
appearing more closely related to C, may have occurred by chance is computed using a binomial distribution.
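
The second and third steps can be sketched as follows. This is a simplified illustration, not the RDP
implementation: the function names are made up for this example, the windowed identity is a plain proportion of
matching sites, and the binomial tail is shown without the additional correction factors that the actual program
may apply.

```python
from math import comb


def window_identities(a, b, c, width):
    """Percent identity of the pairs (A,B), (A,C), (B,C) in each window, stepping one site at a time."""
    pairs = {"AB": (a, b), "AC": (a, c), "BC": (b, c)}
    out = []
    for start in range(len(a) - width + 1):
        row = {}
        for name, (x, y) in pairs.items():
            same = sum(x[i] == y[i] for i in range(start, start + width))
            row[name] = 100.0 * same / width
        out.append(row)
    return out


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p).

    Here n would be the number of informative sites in the candidate region, k the number of
    those sites at which one of the sisters agrees with C, and p the corresponding proportion
    over the whole alignment (an assumed reading of the chance calculation).
    """
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Toy triplet: B matches A at first but matches C at its last site.
profile = window_identities("AAAA", "AAAT", "TTTT", width=2)
chance = binomial_tail(4, 4, 0.5)  # four agreements out of four sites when half are expected
```

A region in which the A-C or B-C identity rises above the A-B identity is a candidate recombinant region; the
binomial tail then asks how surprising that many agreeing sites would be by chance.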

The software GeneConv is based on an earlier statistical approach for detecting gene conversion [29]. It
uses the term fragment for an aligned or homologous pair of segments in the input alignment. In the process, the
highest-scoring fragments in the given alignment are listed and assigned p-values based on the assumption of a
random distribution of polymorphic sites. Scores are assigned as follows. First, all sites that are monomorphic
in the alignment are discarded so that only polymorphic sites are considered. Second, for a given pair of
sequences, matching bases are scored as +1 and mismatches as -m, where m depends on the pair of sequences.
Fragments are assigned p-values in a way similar to the BLAST procedure [2][14]. This p-value is an approximation
of the proportion of permutations of the polymorphic sites for which that pair of sequences has some fragment with
the observed score or larger [29].
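
The scoring step can be illustrated with a short sketch that keeps only polymorphic columns and then finds the
highest-scoring fragment for one pair of sequences with a running-sum scan. The function names and the fixed
mismatch penalty passed in by the caller are assumptions of this example; how GeneConv chooses m for each pair,
and how it derives p-values, is not reproduced here.

```python
def polymorphic_columns(alignment):
    """Indices of columns in which the aligned sequences are not all identical."""
    return [i for i in range(len(alignment[0])) if len({s[i] for s in alignment}) > 1]


def best_fragment(seq_x, seq_y, columns, mismatch_penalty):
    """Highest-scoring run of the given columns for one pair of sequences.

    Each match scores +1 and each mismatch scores -mismatch_penalty.
    Returns (score, first_column, last_column); columns are None if no positive fragment exists.
    """
    best = (0.0, None, None)
    running = 0.0
    start = None
    for col in columns:
        if running <= 0:
            running = 0.0
            start = col  # a new candidate fragment begins here
        running += 1.0 if seq_x[col] == seq_y[col] else -mismatch_penalty
        if running > best[0]:
            best = (running, start, col)
    return best


# Toy alignment (hypothetical): the third sequence makes every column polymorphic.
aln = ["AAAAAAAA", "AATAAAAT", "TTTTTTTT"]
cols = polymorphic_columns(aln)
fragment = best_fragment(aln[0], aln[1], cols, mismatch_penalty=2.0)
```

In the toy example the best fragment for the first two sequences is the run of four matching sites between the
two mismatches.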

The BootScan method has two phases: a "scanning phase" and a "detection phase". In the scanning phase,
non-informative sites are first discarded from the input data set, and a window of user-defined width is moved
along the aligned sequences. In each window, bootstrap replicates are generated and either UPGMA trees (rooted by
definition) or midpoint-rooted neighbor-joining (NJ) trees are computed. In the detection phase, every triplet of
sequences is individually examined for bootstrap evidence that one of the sequences may be alternately more
closely related to each of the other two sequences at different positions along its length. The probability that the
pattern of sites within a potential recombinant region could have occurred by a chance distribution of mutations is
approximated using a Bonferroni-corrected version of the binomial distribution.

The software MaxChi considers only polymorphic sites: For a given position of the moving window on the 
input sequence alignment and for a given pair of sequences, a chi-square statistic is computed to compare two 
proportions: the proportion of sites at which the sequences agree in the left half-window and the proportion of
sites at which the sequences agree in the right half-window. Discordance between these two proportions may 
reflect a recombination event in the history of the two sequences. The maximum chi-square over all sequence 
pairs is recorded as a summary of the evidence for recombination at the window center. Significance of observed 
chi-square statistics is assessed by a Monte Carlo permutation test. 
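
A minimal sketch of the windowed chi-square comparison for a single pair of sequences is given below. It assumes
the two input strings already contain only the polymorphic sites, uses the standard Pearson statistic for a 2x2
table, and omits the permutation test; the function names are made up for this illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]]; 0.0 if a margin is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def max_chi_scan(seq_x, seq_y, half_window):
    """Slide a window of 2 * half_window sites along the pair and keep the largest chi-square.

    Returns (best_chi_square, centre), where centre is the 0-based index of the first site of
    the right half-window, i.e. the putative breakpoint.
    """
    agree = [x == y for x, y in zip(seq_x, seq_y)]
    best = (0.0, None)
    for centre in range(half_window, len(agree) - half_window + 1):
        left = sum(agree[centre - half_window:centre])
        right = sum(agree[centre:centre + half_window])
        chi2 = chi_square_2x2(left, half_window - left, right, half_window - right)
        if chi2 > best[0]:
            best = (chi2, centre)
    return best


# Toy pair: identical over the first six sites, different over the last six.
peak = max_chi_scan("A" * 12, "A" * 6 + "T" * 6, half_window=4)
```

The statistic peaks where the agreement pattern changes most sharply, which in the toy pair is exactly at the
switch from matching to mismatching sites.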

The software Chimaera is also a modification of Maynard Smith's maximum chi-square method [34] that uses only
variable sites. The statistic is the maximum chi-square value in the original alignment. The p-value equals the
number of times the original statistic is smaller than the statistic from permuted alignments, divided by the
number of permutations. For all calculations, a sliding window was used, with the width of the window set to the
number of polymorphic sites divided by 1.5. This window moves in steps of one nucleotide at a time.

The software SIScan uses an idea similar to the algorithms implemented in MaxChi and Chimaera, but
instead of using contingency tables it assumes a Gaussian distribution and uses Z-scores to compute the p-value.

The software 3Seq is similar to RDP: 3Seq discards non-informative sites from the input data set and then,
for every triplet of taxa {A, B, C} from the data set, treats two of the sequences as parent sequences that
may have recombined, with one or two breakpoints, to form the third sequence (the child sequence). Excess
similarity of the child sequence to a candidate recombinant of the parents is a sign of recombination; the
maximum value of this excess similarity is taken as the test statistic. The distribution of
the excess similarity is then calculated rapidly and used to estimate the p-value.

4 Results 

The consensus trees found with the 1-1505 nt and 1000-1505 nt alignments were the same as the
dominant tree found with the HMM software (the pink tree in Fig. 1).

Fig. 1: The tree topology drawn in pink (series 14) in Fig. 2. This is an unrooted tree. It is the
most likely tree topology from position 1 to 140 and from position 300 to 1000. The software from [33] and RDP3
[21] indicate a potential recombination event among taxa 0, 1, 2, and 3 in the red rectangle. The ML tree
estimated by the software PHYML under the HKY model has the same tree topology, as does the consensus tree
estimated by the software MrBayes under HKY and GTR + Γ.

Fig. 2: Estimated probability of each tree topology at each site, computed using the
software from [33]. The labels "Series i" for i = 1, ..., 17 in the figure represent the different tree topologies.
The y-axis represents the probability of each tree topology and the x-axis represents the position number. The tree
drawn in pink is shown in Fig. 1, and the tree drawn in dark blue, which dominates from position 140 to 300, is
shown in Fig. 3.




Fig. 3: The tree drawn in dark blue (series 1) in Fig. 2. This is an unrooted tree. It is the most likely tree
topology from position 140 to 300. The software from [33] and RDP3 [21] indicate a potential recombination
event among taxa 0, 1, 2, and 3 in the red rectangle.

Then we estimated the maximum likelihood (ML) tree from the whole alignment (position 1
through position 1505). We inferred the phylogenetic tree by maximum likelihood using the PHYML
v3 software [12] with all settings at their defaults: the evolutionary model is the HKY model, the tree topology
search operation is Nearest Neighbor Interchange (NNI), and the starting tree is computed using
BIONJ [10]. To analyze the splits of the different taxa on the ML tree we applied bootstrapping to the columns of
each alignment with bootstrap sample size 1000. The ML tree found with the 1-1505 nt alignment was the
same as the dominant tree found with the HMM software (the pink tree in Fig. 1).

From position 1 to position 141 and from position 312 to position 1000, the tree topology in Fig. 1 has
probability close to 1.0 (see Fig. 2). Note that the estimated ML tree and the estimated consensus tree
reconstructed from the whole sequences using an estimated posterior distribution have this same tree topology.
However, from position 141 to position 311 in the alignment, the tree topology in Fig. 3 has probability close to
1.0 (see Fig. 2). The Robinson-Foulds (RF) distance [27] between the tree topology in Fig. 3 and the tree topology
in Fig. 1 is 6. Note that the largest possible RF distance for unrooted trees with n taxa is 2n - 6, which is 14 in
our case (so the normalized RF distance between these tree topologies is 6/14 ≈ 0.43). Thus we do not think this
difference arises from low support for a single split; rather, it strongly suggests that there are possible
recombination breakpoints around positions 142 and 311.
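
The RF arithmetic can be checked with a small sketch. It assumes each unrooted tree is represented by the set of
its non-trivial splits, with every split stored as the side that contains the smallest taxon label; this encoding
and the function names are choices made for the example. The five-taxon trees are hypothetical, while the last
line uses the values quoted in the text (distance 6, n = 10).

```python
def canonical_split(side, taxa):
    """Represent a split by the side that contains the smallest taxon label."""
    side = frozenset(side)
    other = frozenset(taxa) - side
    return side if min(side) < min(other) else other


def rf_distance(splits_a, splits_b):
    """Robinson-Foulds distance: number of non-trivial splits present in exactly one of the trees."""
    return len(set(splits_a) ^ set(splits_b))


def max_rf(n_taxa):
    """Largest possible RF distance between two unrooted binary trees on n_taxa leaves."""
    return 2 * n_taxa - 6


def normalized_rf(distance, n_taxa):
    return distance / max_rf(n_taxa)


# Hypothetical five-taxon trees ((0,1),2,(3,4)) and ((0,2),1,(3,4)).
taxa = range(5)
tree_a = {canonical_split({0, 1}, taxa), canonical_split({3, 4}, taxa)}
tree_b = {canonical_split({0, 2}, taxa), canonical_split({3, 4}, taxa)}
toy_distance = rf_distance(tree_a, tree_b)

# Values from the text: RF distance 6 between the two topologies on n = 10 taxa.
paper_normalized = normalized_rf(6, 10)
```

With ten taxa each tree has seven non-trivial splits, so a distance of 6 means that three splits of each tree are
absent from the other.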

In order to compute the support for each split we also computed the consensus tree using the software
MrBayes (Fig. 4) and the ML tree using PHYML (Fig. 5). For the consensus tree we used the posterior
distribution, and for the ML tree we used the bootstrap with sample size 1000, to compute the support for each
split. Both trees have the same topology as the tree in Fig. 1, and the support for each split in the ML tree and
the consensus tree is very high. In particular, the probability of each split on the consensus tree estimated
from the whole sequences under HKY is 1.0 (100%). Although one of the splits on the ML tree reconstructed
from the whole sequences under HKY has about 90% support, all other splits have strong support (Fig. 5).

The mutation rates at each site were also estimated by the software from [33]; the estimated mutation
rates are mostly between 0.1 and 0.3 (Fig. 6).

RDP3 estimated a similar recombination event, in which a small genome fragment of ToLCMaB-[IN;Pun;04]
(taxon-3) (positions 142-311 in the alignment) has migrated from ToLCB-[PK;RYK;97] (taxon-1), as marked by the red
rectangle in Fig. 7. RDP3 uses multiple methods for recombination detection, and the average p-values from
the different methods are listed in Table 2.

Fig. 4: The consensus tree estimated by the software MrBayes under HKY from the whole alignment (position 1
through position 1505). This is an unrooted tree. The number on each split represents the probability
of the split. The consensus tree estimated under GTR + Γ has the same tree topology but
smaller probabilities for some splits. Note that the tree topology of the consensus tree is the same as that of
the ML tree in Fig. 5 and the tree topology in Fig. 1.

Fig. 5: The ML tree estimated by the software PHYML under the HKY model from the whole alignment (position 1
through position 1505). This is an unrooted tree. The number on each split represents the probability
of the split estimated by bootstrapping with bootstrap sample size 1000. Note that the tree topology of the
ML tree is the same as that of the consensus tree in Fig. 4 and the tree topology in Fig. 1.

Fig. 6: Estimated probability of each mutation rate at each site, computed using the
software from [33]. The y-axis represents the probability of each mutation rate and the x-axis represents the
position number. The legend lists the rate categories 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10, and 100. The
figure shows that the most common rates are 0.1 and 0.3.

Fig. 7: RDP3 analysis of the same 10-taxon alignment used in our study. Red rectangles indicate the same event
inferred by the HMM method from [33].



5 Conclusion 

We have reported, for the first time, a potential recombination event among taxa 1, 2, and 3, indicating that the
strain ToLCMaB-[IN;Pun;04] (taxon-3) from ToLCMaB is a recombinant of two strains from two different species,
ToLCB-[PK;RYK;97] (taxon-1) and PaLCuB-[IN;Chi;05] (taxon-2). As one study reported, ToLCMaB-[IN;Pun;04]
(taxon-3) and PaLCuB-[IN;Chi;05] (taxon-2) are more closely related in their phylogeny than other species
[30]. Our study showed that ToLCMaB-[IN;Pun;04] (taxon-3) shares sequence identity mainly with
PaLCuB-[IN;Chi;05] (taxon-2), while a small portion of its genome (positions 141 to 312 in the alignment) has
potentially migrated from another species, ToLCB-[PK;RYK;97] (taxon-1).

Method      Events   Av. p-value
RDP         1        1.962 × 10^-13
GENECONV    1        2.158 × 10^-9
BootScan    1        2.073 × 10^-1?
MaxChi      1        7.397 × 10^-?
Chimaera    1        2.830 × 10^-9
3Seq        1        4.410 × 10^-2

Table 2: Average p-values from the different methods in RDP3 [21] for the inferred recombination event between
ToLCMaB-[IN;Pun;04] and ToLCB-[PK;RYK;97] at positions 142-311. We used RDP [19], GENECONV
[24], BootScan [20], MaxChi [31], Chimaera [26], and 3Seq [4]. We set the parameters for each method as
follows. RDP: Reference sequence: no; window size: 30; Detect recombination between sequence identity: 0%-100%.
GENECONV: Sequence option: Treat blocs as one polymorphism; G-scale: 1; Max. number of global frags
listed per sequence pair: 2000; Max. number of pairwise frags listed per sequence pair: 0; Min. aligned fragment
length: 1; Min. polymorphisms in frags: 2; Min. pairwise frag score: 2; Max. number overlapping frags: 1.
BootScan: window size: 200; step size: 20; use distances; number of bootstrap replicates: 100; Random number
seed: 3; cutoff percentage: 70; transversion rate ratio: 0.5; coefficient of variation: 1. MaxChi: Window
size: 70; Gaps: no. Chimaera: Window size: 60. 3Seq: Sequences are circular; Highest acceptable
P-value: 0.05; Bonferroni correction; Number of permutations: 0; use SEQGEN parametric simulations.

Our results indicate that a recombination event occurred between a northern India DNA-β strain,
ToLCB-[PK;RYK;97] (taxon-1), and a southern India DNA-β strain, PaLCuB-[IN;Chi;05] (taxon-2), resulting in a new
strain, ToLCMaB-[IN;Pun;04] (taxon-3), which was found in southern India. Different geographic locations differ in
host physiology, weather conditions, helper viruses, and so on. The phylogenetic relationship among
ToLCB-[PK;RYK;97] (taxon-1), PaLCuB-[IN;Chi;05] (taxon-2), and ToLCMaB-[IN;Pun;04] (taxon-3) coincides with
their distinct geographic relationship, suggesting that the genetic information on the viral genomes from
northern India and southern India may already be adapted to their geographic distribution (Fig. 8). However,
although the recombination event led to the possible emergence of a new strain in a different epidemic location in
India, this strain still has a stronger relationship, geographically and phylogenetically, with its parents than
with other strains that are epidemic in other Asian countries.

The βC1 protein, the product of the C1 gene, can alter leaf development and suppress plant defense systems during
infection [8][35]. The recombination occurred at approximately 100-220 nt of the genome (141-312 in the alignment),
which partially covers the C-terminus of the C1 gene on the betasatellite. The βC1 of ToLCB-[PK;RYK;97] (taxon-1)
has 118 amino acids, while the βC1 of PaLCuB-[IN;Chi;05] (taxon-2) has 122 amino acids. The recombination event
leads to a βC1 protein of ToLCMaB-[IN;Pun;04] (taxon-3) with 118 amino acids, which lacks the 6 amino acids
of the major parent PaLCuB-[IN;Chi;05] (taxon-2) at the C-terminus of βC1 and instead has 2 amino acids from the
C-terminus of βC1 of the minor parent ToLCB-[PK;RYK;97] (taxon-1). Although the functions of the different
domains of βC1 are unknown, the recombination at the C-terminus of βC1 might modulate its function in virus-host
interaction.

DNA-β is known to be capable of adapting to a new helper virus from a distinct geographic location by
modifying its genome [23]. The genetic modification of the southern Indian DNA-β strain ToLCMaB-[IN;Pun;04]
(taxon-3) via a recombination event may contribute to the fitness of this DNA-β strain on its host.

Fig. 8: The geographic distribution of the four betasatellites ToLCBDB-[IN;Luk;05], ToLCB-[PK;RYK;97],
PaLCuB-[IN;Chi;05] and ToLCMaB-[IN;Pun;04], associated with ToLCD in the Indian subcontinent.

6 Discussion 

The advantage of our study is that estimating the phylogenetic tree along the whole alignment with the HMM method
provides a clear history of DNA-β recombination. This is the first time that research on DNA-β recombination
has used such a statistical method to obtain a clear recombination history.

Our study also provides a way to understand DNA virus evolution through recombination events. From our
results, it is likely that the species ToLCMaB is the result of recombination between two different species, namely
ToLCB and PaLCuB. Such a recombination event contributed to the emergence of a new DNA-β species as well
as to the evolution of DNA-β. By providing the recombination history together with geographic information, we
can link the phylogenetic information to the geographic information of DNA-β strains, which helps us understand
the evolution and epidemiology of the virus.